Review Suggested Changes and Make Data Improvements
- To see how Einstein Discovery can improve your data, click Review Changes. The numbers of columns and rows in the dataset are shown so you can verify that they’re correct.
- Review the dataset improvement suggestions. For example, you’re notified if you have a value in a column that rarely occurs, because statistical tests can’t be performed on small samples. If you select the action Do Nothing, these entries are bucketed into a category called “Other.” Or you can choose to delete the row. If the value is spelled incorrectly and you want to fix it, select Find And Replace.
- If a numeric column contains a non-numeric entry, you can replace the entry, delete it, change the column to a text column, delete the rows, or delete the entire column.
- To view the first few rows of data in the original dataset, click Original Dataset.
- To view the data after making dataset improvements, click Improved Dataset.
- To save your changes, click Apply.
- On the screen that appears, you can perform various data preparation tasks. When you make a change, the Create button turns into the Apply button. After you make a change, click Apply.
- Set each field type to Text, Number, or Date. Einstein Discovery makes a best guess based on the values in each column of your dataset, but it’s important to check that the assigned types make sense. Set categorical fields that you want to group by to Text. Set your measures to Number.
- To rename a field, click the field name.
- To delete a field, click the trash icon.
- To view the minimum and maximum values in the imported data, click the gear icon beside a number or date field. To narrow a number or date field to a tighter range, click the minimum or maximum value and edit it.
- To rename categories or view the number of rows in a category, click the gear icon beside a text field. Deselecting the checkbox next to a category removes those rows from your dataset. To select every category, click Add All Values. To deselect every category, click Remove All Values. You can also convert null values to zero or only include numbers. To rename a category, click the category name. You can also combine categories by giving them the same name.
- To use a mathematical formula to create a column, click Add Derived Column. You can create columns based on the existing columns in your data or an absolute number that you specify. For example, you can calculate the ratio of two columns or take a column that contains the temperature in Celsius and create one that contains the temperature in Fahrenheit.
- To add data from another dataset, click Add Column from Lookup Table. Specify the other dataset and the matching criteria (key). Select the field that you want to include from the second dataset. In the New Name field, you can enter a different label for the field. To add another field, click Add Field. When you’re done, click Submit.
- To create a grouped table, click Create Grouped Table. A grouped table combines (tabulates) multiple rows in the dataset into a single row. For example, a table lists individual purchases by a customer, but you want to know the total amount that the customer spent for the year. In that case, you group the purchases by customer and purchase year before analyzing the data. The data is summarized and tabulated by the variable that you chose to group by. You can also specify how each column in your dataset is tabulated based on the primary Group By variable. Pick a numeric field and select Average, Sum, Min, or Max (or for a Text field, select First to use the value of the first row). For example, if you select Average, Einstein Discovery calculates the average of the field for each distinct value of the Group By variable. If you select Sum, it calculates the sum of the field for each distinct value. If you select First, it uses the value from the first row that matches the Group By criterion. If you group individual purchases by customer, and the original table is ordered by date of purchase, First on the Product Name column returns the first product that the customer purchased in that time period.
- When you have finished making your changes, click Apply and then click Create.
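To make the derived column and grouped table steps above more concrete, here is a minimal Python sketch of the same ideas. It is only an illustration and not how Einstein Discovery implements these features. The function names (add_derived_column, create_grouped_table), the AGGREGATES table, and the sample purchase data are all hypothetical and chosen for this example.

```python
# Illustrative sketch only: mimics the derived column and grouped table
# concepts with plain Python. All names and data here are hypothetical.

def add_derived_column(rows, new_name, formula):
    """Return copies of the rows with one extra column computed per row."""
    return [{**row, new_name: formula(row)} for row in rows]


# One function per tabulation option named in the article.
AGGREGATES = {
    "Average": lambda values: sum(values) / len(values),
    "Sum": sum,
    "Min": min,
    "Max": max,
    "First": lambda values: values[0],  # value from the first matching row
}


def create_grouped_table(rows, group_by, aggregations):
    """Collapse rows that share the same group_by values into a single row.

    aggregations maps a column name to one of the AGGREGATES keys.
    Groups appear in the order in which they are first seen, so "First"
    depends on how the input rows are ordered.
    """
    groups = {}
    for row in rows:
        key = tuple(row[column] for column in group_by)
        groups.setdefault(key, []).append(row)

    grouped = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for column, how in aggregations.items():
            out[column] = AGGREGATES[how]([m[column] for m in members])
        grouped.append(out)
    return grouped


# Derived column: Celsius to Fahrenheit.
readings = add_derived_column(
    [{"celsius": 100.0}, {"celsius": 0.0}],
    "fahrenheit",
    lambda row: row["celsius"] * 9 / 5 + 32,
)

# Grouped table: total spend and first product per customer and year.
# The input is assumed to be ordered by date of purchase.
purchases = [
    {"customer": "Acme", "year": 2023, "product": "Widget", "amount": 100.0},
    {"customer": "Acme", "year": 2023, "product": "Gadget", "amount": 50.0},
    {"customer": "Birch", "year": 2023, "product": "Gizmo", "amount": 75.0},
]
grouped = create_grouped_table(
    purchases,
    group_by=["customer", "year"],
    aggregations={"amount": "Sum", "product": "First"},
)
```

In this sketch, the two Acme purchases collapse into one row whose amount is the sum of both purchases and whose product is the one from the earliest row, which mirrors the First example in the grouped table step.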